Multi-Topic Multi-Document Summarization

Authors

  • Masao Utiyama
  • Kôiti Hasida
Abstract

Summarization of multiple documents featuring multiple topics is discussed. The example treated here consists of fifty articles about the Peru hostage incident for December 1996 through April 1997. They include a lot of topics such as opening, negotiation, ending, and so on. The method proposed in this paper is based on spreading activation over documents syntactically and semantically annotated with GDA (Global Document Annotation) tags. The method extracts important documents and important parts therein, and creates a network consisting of important entities and relations among them. It also identifies cross-document coreferences to replace expressions with more concrete ones. The method is essentially multilingual due to the language-independence of the GDA tagset. This tagset can provide a standard format for the study on the transformation and/or generation stage of the summarization process, among other natural language processing tasks.

1 Introduction

A large event consists of a number of smaller events. These component events are usually related, but such relations may not be strong enough to define larger topics. For example, a war may consist of opening, battles, negotiations, and so on. These relatively independent events are considered to be topics by themselves and would accordingly be reported in multiple news articles. Summarization of such a large event, or multiple documents about multiple topics, is the concern of this paper.

Summarization of multiple documents containing multiple topics is an unexplored research issue. Some previous studies on summarization (McKeown and Radev, 1995; Barzilay et al., 1999; Mani and Bloedorn, 1999) deal with multiple documents about a single topic, but not about multiple topics [1]. In order to summarize multiple documents with multiple topics, one needs a general, semantics-oriented method for evaluating importance.

Summarization of a single document may largely exploit the document structure. As an extreme example, the first paragraph of a newspaper article often serves as a summary of the entire article. On the other hand, summarization of multiple documents in general must be more based on their semantic structures, because there is no overall consistent document structure across them. Selection of multiple important topics (not keywords) for multiple-topic summarization has not yet been really addressed in the previous literature. The present paper proposes a method, based on spreading activation, for extracting important topics and important documents.

Another method proposed, which is useful for grasping the overview of multiple documents, is visualization of important entities mentioned and relationships among them. Visualization of relationships among keywords has been studied in the context of information retrieval (Niwa et al., 1997; Sanderson and Croft, 1999), but to the authors' knowledge the present study is the first to address such visualization in the context of summarization. Of course, a concise summary of the entire set of multiple documents can be obtained by recovering sentences from important entities and their relationships, as demonstrated in section 3.3.

The present study assumes documents annotated with GDA (Global Document Annotation) ...

[1] Maybury (1999) discusses summarization of multiple topics, but in his study the summaries are made from an event database but not from documents.
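
The excerpt names spreading activation as the basis for scoring importance but does not give the update rule. The following is a minimal Python sketch of one common form of spreading activation on a weighted undirected graph. It is not the authors' algorithm and it does not model GDA tags. The function names `spread_activation` and `top_nodes`, the `decay` parameter, the uniform external input, and the toy document-entity graph are all assumptions made for illustration.

```python
from collections import defaultdict


def spread_activation(edges, seeds, decay=0.5, iterations=50):
    """Iteratively spread activation over an undirected weighted graph.

    edges: iterable of (node_a, node_b, weight) links (hypothetical input format).
    seeds: dict mapping node -> constant external input.
    Each node keeps its external input and receives a decayed share of each
    neighbor's activation, proportional to the link weight.
    """
    neighbors = defaultdict(dict)
    for a, b, w in edges:
        neighbors[a][b] = w
        neighbors[b][a] = w
    nodes = set(neighbors) | set(seeds)
    # Total link weight per node, used to split a node's activation among neighbors.
    out_weight = {n: sum(neighbors[n].values()) for n in nodes}
    activation = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            incoming = sum(
                (w / out_weight[m]) * activation[m]
                for m, w in neighbors[n].items()
            )
            updated[n] = seeds.get(n, 0.0) + decay * incoming
        activation = updated
    return activation


def top_nodes(activation, k):
    """Return the k nodes with the highest activation, highest first."""
    return sorted(activation, key=activation.get, reverse=True)[:k]


# Toy graph (assumed data): three documents mention "hostage", one mentions "weather".
toy_edges = [
    ("doc1", "hostage", 1.0),
    ("doc2", "hostage", 1.0),
    ("doc3", "hostage", 1.0),
    ("doc4", "weather", 1.0),
]
toy_seeds = {n: 1.0 for edge in toy_edges for n in edge[:2]}
toy_activation = spread_activation(toy_edges, toy_seeds)
most_important = top_nodes(toy_activation, 1)
```

In this sketch, an entity linked to many documents accumulates more activation than one linked to a single document, which is the intuition behind using activation as an importance score. Because each node divides its activation among its neighbors and `decay` is below 1, the values settle to a fixed point rather than growing without bound.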

Similar resources

Multi-Document Summarization Using Document Set Type Classification

In this paper, we propose a summarization system which automatically classifies the type of a document set and summarizes the document set with its appropriate summarization mechanism. This system will classify a document set into three types: (a) one-topic type, (b) multi-topic type, and (c) others. These types will be identified using information on high-frequency nouns and named entities. In our multi...

Multi-Document Summarization using Sentence-based Topic Models

Most of the existing multi-document summarization methods decompose the documents into sentences and work directly in the sentence space using a term-sentence matrix. However, the knowledge on the document side, i.e. the topics embedded in the documents, can help the context understanding and guide the sentence selection in the summarization procedure. In this paper, we propose a new Bayesian s...

Query-focused Multi-Document Summarization: Combining a Topic Model with Graph-based Semi-supervised Learning

Graph-based learning algorithms have been shown to be an effective approach for query-focused multi-document summarization (MDS). In this paper, we extend the standard graph ranking algorithm by proposing a two-layer (i.e. sentence layer and topic layer) graph-based semi-supervised learning approach based on topic modeling techniques. Experimental results on TAC datasets show that by considerin...

Graph-Based Multi-Modality Learning for Topic-Focused Multi-Document Summarization

Graph-based manifold-ranking methods have been successfully applied to topic-focused multi-document summarization. This paper further proposes to use the multi-modality manifold-ranking algorithm for extracting a topic-focused summary from multiple documents by considering the within-document sentence relationships and the cross-document sentence relationships as two separate modalities (graphs)....

A Novel Feature-based Bayesian Model for Query Focused Multi-document Summarization

Supervised learning methods and LDA-based topic models have been successfully applied in the field of multi-document summarization. In this paper, we propose a novel supervised approach that can incorporate rich sentence features into Bayesian topic models in a principled way, thus taking advantage of both topic models and feature-based supervised learning methods. Experimental results on DUC200...

A Hybrid Topic Model for Multi-Document Summarization

Topic features are useful in improving text summarization. However, independency among topics is a strong restriction on most topic models, and alleviating this restriction can deeply capture text structure. This paper proposes a hybrid topic model to generate multi-document summaries using a combination of the Hidden Topic Markov Model (HTMM), the surface texture model and the topic transition...

Publication date: 2000